\[~\]

Problem statement

\[~\]

We are provided with some preprocessed data coming from Autism Brain Image Data Exchange Project. This dataset is partitioned in two subsets: the Autism Spectrum Disorder (ASD) and the Typically Developed (TD) sets.

\[~\]

\[~\]

As shown in the image above, each subset contains the records of 12 patients. A sample of 145 records is associated to each person. Formally, this sample can be treated as an IID sample defined as:

\[ D^{(i)}_n = \{X_1, ..., X_n\} \sim f_X(\cdot) \hspace{1em} \text{ for a certain } f_X \text{, having } n=145 \]

Each \(X_i\) is defined as a random vector \(X_i = [X_i(1), ..., X_i(\mathtt{D})]^T \in \mathbb{R}^\mathtt{D}\), where \(\mathtt{D}=116\) is the number of the features, i.e the number of the Regions of Interest (ROIs) of the person’s brain. We also assume that each patient within a group can be himself treated as an independent sample from a certain \(f_D(\cdot)\). Thus, moving bottom-up in the data tree, each group formally corresponds to:

\[ D_{12} = \{D^{(1)}_n, D^{(2)}_n, ..., D^{(12)}_n\} \]

It is worth to note that the indipendence assumptions at the two levels of the data tree are quite crucial. Thanks to them, we have different ways of pooling the data:

  1. Assume \(D = \{X^{(1)}_1, ..., X^{(1)}_n, ..., X^{(12)}_1, ..., X^{(12)}_n\}\), that is ignoring the fact that each \(X^{(i)}\) comes from different people.
  2. Consider the estimator \(\hat{D}_n = h\big(\{D^{(1)}_n, ..., D^{(12)}_n\}\big)\) for a certain function \(h\).

From this point on, according to [1], we follow the second approach, computing the correlation matrix \(\big(\hat{\rho}_{j,k}\big)^{(i)} \in [-1,1]^{\mathtt{D}\times \mathtt{D}}\) associated to each \(D^{(i)}_n\), \(i=1,...,12\), and taking the mean of them; by the IID property of \(\{D^{(1)}_n, ..., D^{(12)}_n\}\), we have that \(\Big\{\big(\hat{\rho}_{j,k}\big)^{(i)}\Big\}_{i=1}^{12}\) is still independent, being function of each \(D^{(i)}_n\). This choice has many advantages, which will be pointed out further in the discussion.

As pointed out in [2], the Pearson correlation coefficient is non-additive, then it is not mathematically meaningful to take the mean of them as they are. Hence, before averaging, we apply the Fisher Z-transform in order to smoothly map the interval \([-1,1]\) into \(\mathbb{R}\):

\[ \Big\{\hat{Z}^{(i)}_{j,k}\Big\}_{i=1}^{12} = \Big\{atan\left(\hat{\rho}^{(i)}_{j,k}\right)\Big\}_{i=1}^{12} \hspace{1em} \text{where} \hspace{.5em} j,k \in \{1,...,\mathtt{D}\},\hspace{.4em} j\neq k \]

Recall that, for each \(\hat{Z}^{(i)}_{j,k}\), holds:

\[ \hat{Z}^{(i)}_{j,k} \hspace{.4em} \dot{\sim} \hspace{.4em} N_1\Big(atan(\hat{\rho}^{(i)}_{j,k}), \frac{1}{n - 3}\Big), \hspace{1em} n=145 \]

Finally, we take:

\[ \overline{Z}_{12}(j,k) = \frac{1}{12}\sum_{i=1}^{12}\hat{Z}^{(i)}_{j,k} \text{ ,} \]

remembering that \(\hat{Z}^{(i)}_{j,k}\) are \(IID\), being function of \(\hat{\rho}^{(i)}_{j,k}\).

Here the convenience of this approach becomes evident; indeed, being a sample mean, the estimator is unbiased under the assumption of an (unknown) identical distribution \(f_D\); by the CLT, the sample mean is asymptotically normal so that:

\[ \frac{\overline{Z}_{12}(j,k) - Z_{j,k}}{\hat{\sigma}_{j,k}/\sqrt{12}} \hspace{.4em} \dot{\sim} \hspace{.4em} N(0,1), \] where \(\hat{\sigma}_{j,k} = \frac{1}{\sqrt{145 - 3}}\). In building the asymptotic confidence interval for \(Z_{j,k}\), these results reveal pretty relevant.

Before moving forward, for the sake of completeness, we discuss the other possible pooling approaches.
Concatenate the data related to all the patients within a group leads to:

\[ \hat{Z}^{(i)}_{j,k} \hspace{.4em} \dot{\sim} \hspace{.4em} N_1\Big(atan(\hat{\rho}^{(i)}_{j,k}), \frac{1}{n - 3}\Big), \hspace{1em} n=145 \cdot 12, \]

which is statistically more efficient than the chosen estimator. Nevertheless, in building the asymptotic confidence interval problems arise: if the distribution assumptions don’t strongly hold, this approach is less robust than the chosen one.
Eventually, taking the average of the patients before computing the correlation coefficients would yield to a more complicated and potentially less accurate solution.

\[~\]

Data handling and transformation

\[~\]

Before going any further, we verify the distribution of the average of the measurements for each patient inside a group:

\[~\]

## [1] "ASD group"

## [1] "TD group"

\[~\]

Looking at the average measurements of each patient, it can be noted that patients named caltech have values in a different scale. This doesn’t affect the correlation between the ROIs of a patient as long as they are homogeneous. On the contrary, it is an issue if the raw data are pooled together before computing the correlation between ROIs.

\[~\]

After loading the data, we define a function called get.cor.ROIs() in order to return a dataframe of correlations between all possible combinations between the ROIs. Inside this function we transform the correlation values with the Fisher Z-transform as defined in the previous section.

\[~\]

# define a function in order to catch each person and return the correlation matrix between ROIs
get.cor.ROIs <- function(...){
  args <- list(...)
  
  if(!is.null(args[["person"]])) {
    args$frame <- args[["group"]][[args[["person"]]]]
  }
  
  n   <- ncol(args[["frame"]])
  nms <- names(args[["frame"]])
  
  # takes efficiently the combinations, this is useful for large dataset!
  V1  <- rep(nms[1:(n-1)], seq(from=n-1, to = 1, by = -1)) 
  V2  <- unlist(lapply(1:(n-1), function(i)(nms[(i+1):n])))
  
  corr.ROIs <- data.frame(ROI_1=V1, ROI_2=V2) # create the correlation in which I will insert the values
  
  corr.ROIs$z.pearson.corr <- apply(corr.ROIs, 1, function(row) {
    atanh(cor(args[["frame"]][row["ROI_1"]], args[["frame"]][row["ROI_2"]])) # takes the sets of columns; apply the Z-transform to the correlation matrix
  })
  
  return(corr.ROIs)
}

\[~\]

Recall, for these 12 people of each group, the function defined above. We put into the call function the type of group and the corresponding i-th person.

\[~\]

# create the matrix correlations for all patients
for(person in names(asd_sel)) 
  assign(paste("z.trans.", person, sep=""), get.cor.ROIs(group = asd_sel, person = person))
for(person in names(td_sel)) 
  assign(paste("z.trans.", person, sep=""), get.cor.ROIs(group = td_sel, person = person))

\[~\]

After computing the correlation dataframes, for all the patients within each group, we pool the data as hinted at the beginning.

\[~\]

# create a unique matrix for each group
unique.matrix <- function(group) {
  mtx <- z.trans.trinity_0050234[c("ROI_1", "ROI_2")] # create matrix with combinations
  
  mtx$cor.mean <- apply(mtx, 1, function(row) {
    values <- c()
    for(person in names(group)) {
       frame <- get(paste("z.trans.", person, sep="")) # match the address of the string with the real variable
       elem <- frame[(frame[["ROI_1"]] == row["ROI_1"]) & (frame[["ROI_2"]] == row["ROI_2"]), "z.pearson.corr"] # select the correlation
       values <- c(values, elem)
    }
    mean(values) # take the mean!
  })
  
  return(mtx)
}

\[~\]

We finally get the average correlation matrix for the groups ASD and TD:

\[~\]

# call the creation of unique matrix
asd_sel.dt <- unique.matrix(asd_sel); head(asd_sel.dt)
##   ROI_1 ROI_2    cor.mean
## 1  2001  2002  0,51188381
## 2  2001  2101  0,02486222
## 3  2001  2102 -0,21203874
## 4  2001  2111 -0,05246567
## 5  2001  2112 -0,19052649
## 6  2001  2201  0,14638295
td_sel.dt <- unique.matrix(td_sel); head(td_sel.dt)
##   ROI_1 ROI_2     cor.mean
## 1  2001  2002  0,416005628
## 2  2001  2101  0,086652336
## 3  2001  2102 -0,334109327
## 4  2001  2111  0,007217533
## 5  2001  2112 -0,150770771
## 6  2001  2201  0,164410598

\[~\]

Some plots

\[~\]

Before computing the estimated association graphs, we show some plots to investigate the properties of the averaged correlation matrices associated to ASD and TD. According to the assignment introduction, two ROIs are strictly defined as co-activated if they are positively correlated: the greater the Pearson coefficient, the stronger the co-activation between them. Hence, the heatmaps below show the 80 most positively correlated ROIs within the two groups:

\[~\]

\[~\]


Image 1. AAL atlas.






Image 2. Table of ROIs’ IDs and names.

\[~\]

Looking at the plots and at the images above, it is evident that the majority of the co-activated ROIs have near positions in the brain. As an example, the ROIs named 5001 and 5002 are strongly positively correlated in both the groups.

\[~\]

Moving foreward, the histograms below show the distribution of the Z-scaled correlation coefficients of the groups.

\[~\]

\[~\]

As expected, the values are approximately distributed as a Gaussian and the curve is slighly skewed on the left.

\[~\]

Association graphs

\[~\]

Now, we find the quantile value for these two groups. Since we are working in the Z-scale, the threshold is also in the Z-scale.

\[~\]

z.t <- apply(rbind(asd_sel.dt["cor.mean"], td_sel.dt["cor.mean"]), 2, quantile, probs=0.8) 
paste("the threshold at the 80th percentile for the subjects is: ", round(as.numeric(z.t), 3), sep = "")
## [1] "the threshold at the 80th percentile for the subjects is: 0,125"

\[~\]

\[~\]

We compute the confidence interval for each \(\overline{Z}_{12}(j,k)\). Starting from:

\[~\]

\[ \frac{\overline{Z}_{12}(j,k) - Z_{j,k}}{\hat{\sigma}_{j,k} / \sqrt{12}}, \\ \text{ where } \hat{\sigma}_{j,k} = \frac{1}{\sqrt{145 - 3}} \]

\[~\]

and applying the Bonferroni’s correction, this is one of the typical method used for the problem of multiple comparisons. Let \(\textit{H}_{1}, ..., \textit{H}_{m}\) a collection of null hypotheses and the \(\rho_{j, k}\) correlations values that we would want to compare to the alpha level (rejection region). The family wise error rate (FWER) is the probability to reject at least one true hypothesis \(\textit{H}_{0}\), in this way I can commit the first type of error \(\alpha\). In our case as we can see below, all is determined by the m (number of hypotheses):

\[~\]

\[ \frac{\alpha}{m}, \hspace{1em} \text{ where } \hspace{.5em} m = \begin{pmatrix} \mathtt{D} \\ 2 \end{pmatrix}, \hspace{.5em} \mathtt{D}=116 \]

\[~\]

we end up with:

\[~\]

\[ C_{12}^{Z(j,k)}\Big(\frac{\alpha}{m}\Big) = \bigg[\overline{Z}_{12}(j,k) \mp z_{\alpha/2m} \cdot \frac{\hat{\sigma}_{j,k}}{\sqrt{12}}\bigg] \]

\[~\]

conf.int <- function(dt, m = 1, alpha = 0.05) { # m : Family-Wise Error Rate correction (Bonferroni)
  # Asymptotic variance
  n.samp <- 145 # number of IID samples on which we computed the correlation matrix
  sigma <- 1 / sqrt(n.samp - 3) # standard deviation of the (j,k)-th Z-transformed Pearson coefficent 
  n.p <- 12 # number of people in each of the two groups
  se <- sigma / sqrt(n.p) # standard error of the estimator Z12
  
  # Confidence interval
  cint <- sapply(dt$cor.mean, function(z.i) {
    list(confidence_interval = c(lb = z.i - se * qnorm(1 - alpha / (2*m)), 
                                 ub = z.i + se * qnorm(1 - alpha / (2*m))))
  })
  return(cint)
}

\[~\]

Finally, the confidence intervals for the first three ROI couples are:

\[~\]

## Compute the confidence interval
m <- (116 * 115) / 2 # number of intervals
asd_sel.dt$cint <- conf.int(asd_sel.dt, m = m); asd_sel.dt[1:3,]
##   ROI_1 ROI_2    cor.mean                    cint
## 1  2001  2002  0,51188381    0,4033775, 0,6203901
## 2  2001  2101  0,02486222 -0,08364405, 0,13336849
## 3  2001  2102 -0,21203874  -0,3205450, -0,1035325
td_sel.dt$cint <- conf.int(td_sel.dt, m = m); td_sel.dt[1:3,]
##   ROI_1 ROI_2    cor.mean                    cint
## 1  2001  2002  0,41600563    0,3074994, 0,5245119
## 2  2001  2101  0,08665234 -0,02185393, 0,19515860
## 3  2001  2102 -0,33410933  -0,4426156, -0,2256031

\[~\]

Since the threshold is \(z_t=\) 0.125 and two ROIs \(j,k\) are considered connected if \([-z_t, z_t] \cap C_{12}^{Z(j,k)}\Big(\frac{\alpha}{m}\Big)=\emptyset\), we expect to have a connection in the estimated graph between 2001 and 2002, whereas not to have a connection between 2001 and 2101.

\[~\]

Just as an example, we visualize the confidence interval for \(Z(j=2001,k=2002)\) in the ASD and TD groups:

\[~\]

At this point, we compute the estimated association graphs, applying the condition just mentioned. To be more precise, the (conservative) null hypothesis \(H_0^{j,k}\) states that, under a certain threshold, the (Z-transformed) correlation coefficient \(z_{j,k}\) is not relevant and thus ROIs \(j,k\) are not correlated. The null hypothesis is hence rejected if and only if, at a certain level \(\alpha\) of confidence, the correlation value is not in the threshold interval. In this case, ROIs \(j,k\) are connected with an edge.

\[~\]

## Estimate the adjacency matrix given the Z-transformed Pearson correlation coefficient 
get.est.adj.mat <- function(dt, dec.rule, t) {
  # create the adj matrix 
  nameVals <- sort(unique(unlist(dt[1:2]))) # set up storage matrix, get names for row and columns
  
  # construct 0 matrix of correct dimensions with row and column names 
  adj_mat <- matrix(0, length(nameVals), length(nameVals), dimnames = list(nameVals, nameVals))
  
  # fill in the matrix with matrix indexing on row and column names 
  adj_mat[as.matrix(dt[c("ROI_1", "ROI_2")])] <- 0
  
  # put edge according to the decision rule
  for(idx in rownames(dt)){
    if( dec.rule(dt, idx, t) ) {
      adj_mat[as.character(dt[idx, "ROI_1"]), as.character(dt[idx, "ROI_2"])] <- 1 
    }
  }
  
  return(adj_mat)
}


## check if the two intervals int1 and int2 are overlapping
are.joint <- function(int1, int2) return((int1[1] <= int2[2]) && (int2[1] <= int1[2]))


## check the simple threshold condition
check.threshold <- function(dt, idx, t) {
  val <- abs(dt[idx, "cor.mean"])
  return(val >= t)
}


## check the confidence interval condition
check.conf.int <- function(dt, idx, t) {
  c.int <- dt[idx, "cint"]$confidence_interval
  t.int <- c(-t, t)
  return(!are.joint(c.int, t.int))
}

\[~\]

# Adjacency matrix of the estimated graph with the Bonferroni's correction
adj_mat_asd_bonf <- get.est.adj.mat(asd_sel.dt, check.conf.int, z.t) 
adj_mat_td_bonf <- get.est.adj.mat(td_sel.dt, check.conf.int, z.t) 

# Estimated graph
G.asd.bonf <- graph_from_adjacency_matrix(adj_mat_asd_bonf, mode = "undirected")
G.td.bonf <- graph_from_adjacency_matrix(adj_mat_td_bonf, mode = "undirected")

\[~\]

\[~\]

\[~\]

The two estimated association graphs show that there are many links between the ROIs even with the strict Bonferroni’s correction. In the following, we visualize the difference graphs, i.e. the graphs given by the difference between the the edge sets \(\mathcal{E}^{ASD} \setminus \mathcal{E}^{TD}\) and viceversa.

\[~\]

\[~\]

\[~\]

From the graphs above, it is possible to empirically see that many ROIs result connected in one group and disconnected in the other. Before providing the number of edges of the graphs shown so far, we provide a visualization of the estimated graphs computed taking into account only the condition \(\bar{z}_{j,k} \in [-z_t, z_t]\) and taking into account the confidence interval without the Bonferroni’s correction.

\[~\]

\[~\]

The association graphs taking into account the confidence interval without the Bonferroni’s correction are:

\[~\]

\[~\]

\[~\]

As anticipated, the number of edges of each estimated ASD graphs is:

\[~\]

## [1] "Number of edges with Bonferroni's correction:  1073"
## [1] "Number of edges with Bonferroni's correction of (G(ASD)-G(TD)): 300"
## [1] "Number of edges considering only the threshold zt: 2869"
## [1] "Number of edges without Bonferroni's correction:  1893"

\[~\]

While the number of edges of the estimated TD graph:

\[~\]

## [1] "Number of edges with Bonferroni's correction:  1175"
## [1] "Number of edges with Bonferroni's correction (G(TD)-G(ASD)): 402"
## [1] "Number of edges considering only the threshold zt: 2996"
## [1] "Number of edges without Bonferroni's correction:  1971"

\[~\]

As expected, if we don’t take into account the multiplicity adjustment, i.e. \(\frac{\alpha}{m}\), we end up with a grater number of edges in the estimated graphs. This naturally comes from the fact that the aim of the multiplicity adjustment is to reduce the number of false discoveries, i.e. false edges. Just to notice, a confidence interval \(C(\frac{\alpha}{m})\) is wider than \(C(\alpha)\) (cfr. \(C(0.05)\) vs \(C(0.2)\)), hence it is less likely that the aforementioned condition of rejection of the null hypothesis is verified.

\[~\]

In the end, we plot the degree distribution of the estimated association graphs with the Bonferroni’s adjustment:

\[~\]

\[~\]

Even though the most frequent degree is the same for both groups, there are some differences in the degree distribution of the two graphs. From this, we deduce that some ROIs have different co-activation relations between ASD and TD patients.

\[~\]

[BONUS] Bootstrap procedure and the difference graph

\[~\] The aim of this final section is to estimate the difference graph setting up a non-parametric bootstrap simulation. The difference graph between ASD and TD groups is defined as:

\[ \Big|\Delta_{j,k}\Big| = \Big|{\rho}^{ASD}_{j,k} - {\rho}^{TD}_{j,k} \Big|, \text{ having } j,k \in \{1,...,116\}, \text{ } j\neq k \]

Actually, we still work in the Z-scale, then the previous formula becomes:

\[ \Big|\Delta_{j,k}\Big| = \Big|{z}^{ASD}_{j,k} - {z}^{TD}_{j,k} \Big|, \text{ having } j,k \in \{1,...,116\}, \text{ } j\neq k \] \[~\]

In order to run the non-parametric bootstrap simulation, we are going to concatenate the measurements of all the patients within a group, obtaining a “patient archetype”.
Before doing this, as pointed out at the beginning of the discussion, caltech patients in the two groups are discarded, in order to have values in a homogeneous measure unit.

\[~\]

## Concatenate the patients into one patient
concat.data <- function(dataset) {
  out <- dataset[[1]]
  for(i in 2:length(dataset)) out <- rbind(out, dataset[[i]])
  return(out)
}

asd_fusion <- concat.data(asd_sel)
td_fusion <- concat.data(td_sel)

\[~\]

As a starting point in building the bootstrap confidence interval, we compute the true (in the bootstrap world) correlation difference:

\[~\]

# Considering the two main datasets for these two groups, obtain the difference graph
diff.dt <- asd_sel.dt[, 1:2] # take the combinations between the ROIs
diff.zcor.mat <- atan(cor(asd_fusion)) - atan(cor(td_fusion))
diff.dt$cor.mean <- as.vector(diff.zcor.mat[upper.tri(diff.zcor.mat, diag = F)]) 

\[~\]

For the choice of the threshold, we look at the ECDF plot shown below. Following [3], since we are testing \(\Big|{z}^{ASD}_{j,k} - {z}^{TD}_{j,k}\Big| \geq z_t\), we assume as a conservative threshold the \(80^{th}\) percentile of this difference: within \([-z_t, z_t]\), \(z_{j,k}\) is considered the same in both the groups.

\[~\]

z.diff.t <- apply(diff.dt["cor.mean"], 2, quantile, probs=0.8) 
paste("The threshold at the 80th percentile for the difference is: ", round(as.numeric(z.diff.t), 3), sep = "")
## [1] "The threshold at the 80th percentile for the difference is: 0,082"

\[~\]

\[~\]

We visualize the estimated difference graph, taking into account only the threshold just computed:

\[~\]

\[~\]

We look at the distribution of the Z-transformed correlation difference:

\[~\]

\[~\]

Being a difference between approximately normal distributions, the distribution of the Z-transformed correlation difference is also approximately normal. Then, computing the normal interval of this difference is meaningful.

\[~\]

The following function implements the non-parametric bootstrap procedure:

\[~\]

## Non-parametric bootstrap
non.parametric.bootstrap <- function(data1, data2, B = 1000) {
  n <- 116 # number of ROIs
  p.boot <- array(NA, dim = c(B, n, n)) # replicate a dataframe with dimension (B, n, n)
  n.sample1 <- nrow(data1)
  n.sample2 <- nrow(data2)
  
  for(b in 1:B) {
    idx1 <- sample(1:n.sample1, replace = TRUE) # take n samples with replacement
    idx2 <- sample(1:n.sample2, replace = TRUE) # take n samples with replacement
    p.boot[b,,] <- atan(cor(data1[idx1,])) - atan(cor(data2[idx2,])) # inserting the i-th estimated pearson's correlation
  }
  return(p.boot)
}

\[~\]

We run the bootstrap procedure 1000 times and we compute the normal interval for the correlation difference.

\[~\]

p.boot <- non.parametric.bootstrap(asd_fusion, td_fusion, B = 1000) # getting the bootstrapped correlations samples

# Normal Bootstrap CI
alpha <- 0.05 # 95%, confidence level
m <- (116 * 115) / 2 # the number of hypotheses
mean.boot <- apply(p.boot, 2:3, mean) # estimated mean
se.boot <- apply(p.boot, 2:3, sd) # estimated standard deviation

diff.dt$mean.boot <- as.vector(mean.boot[upper.tri(mean.boot, diag = F)]) # inserting the mean.boot into the difference dataframe
diff.dt$se.boot <- as.vector(se.boot[upper.tri(se.boot, diag = F)]) # inserting the se.boot into the difference dataframe

diff.dt$cint <- sapply(1:nrow(diff.dt), function(idx) { # inserting the estimated normal CI boostrapped
    list(confidence_interval = c(lb = diff.dt$cor.mean[idx] - diff.dt$se.boot[idx] * qnorm(1 - alpha / (2*m)),
                                 ub = diff.dt$cor.mean[idx] + diff.dt$se.boot[idx] * qnorm(1 - alpha / (2*m))))
})

\[~\]

The maximum bootstrap standard error and bias are:

##      MAX_SE    MAX_BIAS 
## 0,049100745 0,004501492

\[~\]

As before, let \(C^{\Delta_{j,k}}_n(\alpha)\) being the normal non-parametric confidence interval for the difference \(\Delta_{j,k}\) between the Z-transformed correlation coefficients of the ROIs \(j,k \in \{1,...,116\}\). The estimated difference graph \(\mathcal{\hat{G}}^\Delta(z_t)\) has an edge between nodes \(j,k\) if and only if:

\[ [-z_t, z_t] \cap C^{\Delta_{j,k}}_n(\alpha) = \emptyset \]

\[~\]

Hence, the resulting difference graph is:

\[~\]

\[~\]

The number of edges of the estimated difference graph is:

\[~\]

## [1] "Number of edges with Bonferroni's correction:  114"
## [1] "Number of edges considering only the threshold zt:  2633"

\[~\]

Again, let’s look at the degree distribution of this estimated graph:

\[~\]

\[~\]

Considering the results, under the conditions discussed so far, there are some ROIs’ co-activation differences between ASD and TD groups. In the end, it is interesting to note the difference between \(\mathcal{G^\Delta}\) estimated in this section and the one estimated in the previous section: different procedures led to different estimations.




References

\[~\]

[1] David M. Corey, William P. Dunlap & Michael J. Burke (1998) Averaging Correlations: Expected Values and Bias in Combined Pearson rs and Fisher’s z Transformations, The Journal of General Psychology, 125:3, 245-261, DOI: 10.1080/00221309809595548

[2] Garcia, Edel. (2016). The Self-Weighting Model Tutorial: Part 1.

[3] https://www.xaprb.com/blog/2015/11/07/setting-thresholds-with-quantiles/

\[~\]